Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 41
Filter
1.
Genome Biol ; 25(1): 100, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641812

ABSTRACT

Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.


Subject(s)
Metadata , Research Design , Reproducibility of Results
2.
Nucleic Acids Res ; 52(D1): D1227-D1235, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953380

ABSTRACT

The Drug-Gene Interaction Database (DGIdb, https://dgidb.org) is a publicly accessible resource that aggregates genes or gene products, drugs and drug-gene interaction records to drive hypothesis generation and discovery for clinicians and researchers. DGIdb 5.0 is the latest release and includes substantial architectural and functional updates to support integration into clinical and drug discovery pipelines. The DGIdb service architecture has been split into separate client and server applications, enabling consistent data access for users of both the application programming interface (API) and web interface. The new interface was developed in ReactJS, and includes dynamic visualizations and consistency in the display of user interface elements. A GraphQL API has been added to support customizable queries for all drugs, genes, annotations and associated data. Updated documentation provides users with example queries and detailed usage instructions for these new features. In addition, six sources have been added and many existing sources have been updated. Newly added sources include ChemIDplus, HemOnc, NCIt (National Cancer Institute Thesaurus), Drugs@FDA, HGNC (HUGO Gene Nomenclature Committee) and RxNorm. These new sources have been incorporated into DGIdb to provide additional records and enhance annotations of regulatory approval status for therapeutics. Methods for grouping drugs and genes have been expanded upon and developed as independent modular normalizers during import. The updates to these sources and grouping methods have resulted in an improvement in FAIR (findability, accessibility, interoperability and reusability) data representation in DGIdb.


Subject(s)
Precision Medicine , Humans , Databases, Pharmaceutical , Drug Discovery , Internet , User-Computer Interface , Vocabulary, Controlled
3.
JAMIA Open ; 6(4): ooad093, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37954974

ABSTRACT

Objective: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continues to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that enable the linking together of like-concepts despite source-dependent differences in data structure or semantic representation. Materials and Methods: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record. Results: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe an increase in overlap of therapeutic concepts in 2 or more knowledge bases after harmonization using Thera-Py (9.8%-41.8%). Conclusion: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin.

4.
ArXiv ; 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37426450

ABSTRACT

Multiplexed Assays of Variant Effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines has led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.

5.
PLoS One ; 18(5): e0285433, 2023.
Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.


Subject(s)
Neoplasms , Software , Humans , Genomics , Databases, Factual , Gene Library
6.
Adv Genet (Hoboken) ; 4(1): 2200016, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36910590

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.

7.
Pac Symp Biocomput ; 28: 383-394, 2023.
Article in English | MEDLINE | ID: mdl-36540993

ABSTRACT

As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.


Subject(s)
Computational Biology , Genetic Variation , Humans , Databases, Genetic , Genomics , Genotype
8.
Pac Symp Biocomput ; 28: 531-535, 2023.
Article in English | MEDLINE | ID: mdl-36541006

ABSTRACT

The Clinical Genome Resource (ClinGen) serves as an authoritative resource on the clinical relevance of genes and variants. In order to support our curation activities and to disseminate our findings to the community, we have developed a Data Platform of informatics resources backed by standardized data models. In this workshop we demonstrate our publicly available resources including curation interfaces, (Variant Curation Interface, CIViC), supporting infrastructure (Allele Registry, Genegraph), and data models (SEPIO, GA4GH VRS, VA).


Subject(s)
Computational Biology , Genetic Variation , Humans , Databases, Genetic , Genome, Human , Genomics
9.
Bioinformatics ; 38(23): 5279-5287, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36222570

ABSTRACT

MOTIVATION: Despite the increasing evidence of utility of genomic medicine in clinical practice, systematically integrating genomic medicine information and knowledge into clinical systems with a high-level of consistency, scalability and computability remains challenging. A comprehensive terminology is required for relevant concepts and the associated knowledge model for representing relationships. In this study, we leveraged PharmGKB, a comprehensive pharmacogenomics (PGx) knowledgebase, to formulate a terminology for drug response phenotypes that can represent relationships between genetic variants and treatments. We evaluated coverage of the terminology through manual review of a randomly selected subset of 200 sentences extracted from genetic reports that contained concepts for 'Genes and Gene Products' and 'Treatments'. RESULTS: Results showed that our proposed drug response phenotype terminology could cover 96% of the drug response phenotypes in genetic reports. Among 18 653 sentences that contained both 'Genes and Gene Products' and 'Treatments', 3011 sentences were able to be mapped to a drug response phenotype in our proposed terminology, among which the most discussed drug response phenotypes were response (994), sensitivity (829) and survival (332). In addition, we were able to re-analyze genetic report context incorporating the proposed terminology and enrich our previously proposed PGx knowledge model to reveal relationships between genetic variants and treatments. In conclusion, we proposed a drug response phenotype terminology that enhanced structured knowledge representation of genomic medicine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomic Medicine , Pharmacogenetics , Pharmacogenetics/methods , Knowledge Bases , Phenotype
11.
Br J Cancer ; 127(8): 1540-1549, 2022 11.
Article in English | MEDLINE | ID: mdl-35871236

ABSTRACT

BACKGROUND: Cholangiocarcinoma (CCA) is a primary malignancy of the biliary tract with a dismal prognosis. Recently, several actionable genetic aberrations were identified with significant enrichment in intrahepatic CCA, including FGFR2 gene fusions with a prevalence of 10-15%. Recent clinical data demonstrate that these fusions are druggable in a second-line setting in advanced/metastatic disease and the efficacy in earlier lines of therapy is being evaluated in ongoing clinical trials. This scenario warrants standardised molecular profiling of these tumours. METHODS: A detailed analysis of the original genetic data from the FIGHT-202 trial, on which the approval of Pemigatinib was based, was conducted. RESULTS: Comparing different detection approaches and displaying representative cases, we described the genetic landscape and architecture of FGFR2 fusions in iCCA and show biological and technical aspects to be considered for their detection. We elaborated parameters, including a suggestion for annotation, that should be stated in a molecular diagnostic FGFR2 report to allow a complete understanding of the analysis performed and the information provided. CONCLUSION: This study provides a detailed presentation and dissection of the technical and biological aspects regarding FGFR2 fusion detection, which aims to support molecular pathologists, pathologists and clinicians in diagnostics, reporting of the results and decision-making.


Subject(s)
Bile Duct Neoplasms , Cholangiocarcinoma , Bile Duct Neoplasms/drug therapy , Bile Ducts, Intrahepatic/pathology , Cholangiocarcinoma/drug therapy , Genomics , Humans , Molecular Diagnostic Techniques , Receptor, Fibroblast Growth Factor, Type 2/genetics
14.
Database (Oxford) ; 20222022 05 25.
Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec.


Subject(s)
Metadata , Semantic Web , Data Management , Databases, Factual , Workflow
15.
Genet Med ; 24(5): 986-998, 2022 05.
Article in English | MEDLINE | ID: mdl-35101336

ABSTRACT

PURPOSE: Several professional societies have published guidelines for the clinical interpretation of somatic variants, which specifically address diagnostic, prognostic, and therapeutic implications. Although these guidelines for the clinical interpretation of variants include data types that may be used to determine the oncogenicity of a variant (eg, population frequency, functional, and in silico data or somatic frequency), they do not provide a direct, systematic, and comprehensive set of standards and rules to classify the oncogenicity of a somatic variant. This insufficient guidance leads to inconsistent classification of rare somatic variants in cancer, generates variability in their clinical interpretation, and, importantly, affects patient care. Therefore, it is essential to address this unmet need. METHODS: Clinical Genome Resource (ClinGen) Somatic Cancer Clinical Domain Working Group and ClinGen Germline/Somatic Variant Subcommittee, the Cancer Genomics Consortium, and the Variant Interpretation for Cancer Consortium used a consensus approach to develop a standard operating procedure (SOP) for the classification of oncogenicity of somatic variants. RESULTS: This comprehensive SOP has been developed to improve consistency in somatic variant classification and has been validated on 94 somatic variants in 10 common cancer-related genes. CONCLUSION: The comprehensive SOP is now available for classification of oncogenicity of somatic variants.


Subject(s)
Genome, Human , Neoplasms , Genetic Testing/methods , Genetic Variation/genetics , Genome, Human/genetics , Genomics/methods , Humans , Neoplasms/genetics , Virulence
16.
Semin Cancer Biol ; 84: 129-143, 2022 09.
Article in English | MEDLINE | ID: mdl-33631297

ABSTRACT

The complexity of diagnostic (surgical) pathology has increased substantially over the last decades with respect to histomorphological and molecular profiling. Pathology has steadily expanded its role in tumor diagnostics and beyond from disease entity identification via prognosis estimation to precision therapy prediction. It is therefore not surprising that pathology is among the disciplines in medicine with high expectations in the application of artificial intelligence (AI) or machine learning approaches given their capabilities to analyze complex data in a quantitative and standardized manner to further enhance scope and precision of diagnostics. While an obvious application is the analysis of histological images, recent applications for the analysis of molecular profiling data from different sources and clinical data support the notion that AI will enhance both histopathology and molecular pathology in the future. At the same time, current literature should not be misunderstood in a way that pathologists will likely be replaced by AI applications in the foreseeable future. Although AI will transform pathology in the coming years, recent studies reporting AI algorithms to diagnose cancer or predict certain molecular properties deal with relatively simple diagnostic problems that fall short of the diagnostic complexity pathologists face in clinical routine. Here, we review the pertinent literature of AI methods and their applications to pathology, and put the current achievements and what can be expected in the future in the context of the requirements for research and routine diagnostics.


Subject(s)
Artificial Intelligence , Neoplasms , Humans , Machine Learning , Neoplasms/diagnosis , Neoplasms/genetics , Prognosis
18.
BMC Genomics ; 22(1): 872, 2021 Dec 04.
Article in English | MEDLINE | ID: mdl-34863095

ABSTRACT

BACKGROUND: Pediatric cancers typically have a distinct genomic landscape when compared to adult cancers and frequently carry somatic gene fusion events that alter gene expression and drive tumorigenesis. Sensitive and specific detection of gene fusions through the analysis of next-generation-based RNA sequencing (RNA-Seq) data is computationally challenging and may be confounded by low tumor cellularity or underlying genomic complexity. Furthermore, numerous computational tools are available to identify fusions from supporting RNA-Seq reads, yet each algorithm demonstrates unique variability in sensitivity and precision, and no clearly superior approach currently exists. To overcome these challenges, we have developed an ensemble fusion calling approach to increase the accuracy of identifying fusions. RESULTS: Our Ensemble Fusion (EnFusion) approach utilizes seven fusion calling algorithms: Arriba, CICERO, FusionMap, FusionCatcher, JAFFA, MapSplice, and STAR-Fusion, which are packaged as a fully automated pipeline using Docker and Amazon Web Services (AWS) serverless technology. This method uses paired end RNA-Seq sequence reads as input, and the output from each algorithm is examined to identify fusions detected by a consensus of at least three algorithms. These consensus fusion results are filtered by comparison to an internal database to remove likely artifactual fusions occurring at high frequencies in our internal cohort, while a "known fusion list" prevents failure to report known pathogenic events. We have employed the EnFusion pipeline on RNA-Seq data from 229 patients with pediatric cancer or blood disorders studied under an IRB-approved protocol. The samples consist of 138 central nervous system tumors, 73 solid tumors, and 18 hematologic malignancies or disorders. The combination of an ensemble fusion-calling pipeline and a knowledge-based filtering strategy identified 67 clinically relevant fusions among our cohort (diagnostic yield of 29.3%), including RBPMS-MET, BCAN-NTRK1, and TRIM22-BRAF fusions. Following clinical confirmation and reporting in the patient's medical record, both known and novel fusions provided medically meaningful information. CONCLUSIONS: The EnFusion pipeline offers a streamlined approach to discover fusions in cancer, at higher levels of sensitivity and accuracy than single algorithm methods. Furthermore, this method accurately identifies driver fusions in pediatric cancer, providing clinical impact by contributing evidence to diagnosis and, when appropriate, indicating targeted therapies.


Subject(s)
Genome , Neoplasms , Child , Genomics , Humans , Neoplasms/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA
20.
Cell Genom ; 1(2)2021 Nov 10.
Article in English | MEDLINE | ID: mdl-35311178

ABSTRACT

Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced "verse"), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers. The VRS framework includes a terminology and information model, machine-readable schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. VRS was developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH).

SELECTION OF CITATIONS
SEARCH DETAIL
...